A Faster Index Algorithm and a Computational Study for Bandits with Switching Costs

Author

  • José Niño-Mora
Abstract

We address the intractable multi-armed bandit problem with switching costs, for which Asawa and Teneketzis introduced in [M. Asawa and D. Teneketzis. 1996. Multi-armed bandits with switching penalties. IEEE Trans. Automat. Control, 41 328–348] an index that partially characterizes optimal policies, attaching to each project state a “continuation index” (its Gittins index) and a “switching index.” They proposed to jointly compute both as the Gittins index of a project with 2n states (when the original project has n states), resulting in an eight-fold increase in O(n^3) arithmetic operations relative to those needed to compute the continuation index. We present a faster decoupled computation method, which in a first stage computes the continuation index and then, in a second stage, computes the switching index an order of magnitude faster in at most n^2 + O(n) arithmetic operations, achieving overall a four-fold reduction in arithmetic operations and substantially reduced memory operations. The analysis exploits the fact that the Asawa and Teneketzis index is the marginal productivity index of the project in its restless reformulation, using methods introduced by the author. Extensive computational experiments are reported, which demonstrate the dramatic runtime speedups achieved by the new algorithm, as well as the near-optimality of the resultant index policy and its substantial gains against the benchmark Gittins index policy across a wide range of randomly generated two- and three-project instances.
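To make the notion of a continuation index concrete, the following is a minimal sketch of one classical way to compute a Gittins index: the restart-in-state formulation solved by value iteration. It is not the decoupled algorithm of the paper, and it is far less efficient than the operation counts quoted in the abstract. The function name `gittins_index_restart`, the parameters `P` (transition matrix as nested lists), `r` (per-state rewards), `beta` (discount factor), `tol`, and `max_iter` are all assumptions made for illustration.

```python
def gittins_index_restart(P, r, beta, tol=1e-10, max_iter=100_000):
    """Illustrative Gittins index via the restart-in-state formulation.

    For each state i, solve by value iteration
        V(j) = max( r[j] + beta * sum_k P[j][k] V(k),   # continue from j
                    r[i] + beta * sum_k P[i][k] V(k) )  # restart in i
    and report (1 - beta) * V(i) as the index of state i.
    Hypothetical helper, not the paper's method.
    """
    n = len(r)
    index = []
    for i in range(n):
        V = [0.0] * n
        for _ in range(max_iter):
            # Value of continuing one step from every state j.
            cont = [r[j] + beta * sum(P[j][k] * V[k] for k in range(n))
                    for j in range(n)]
            # In each state, take the better of continuing or restarting in i.
            V_new = [max(c, cont[i]) for c in cont]
            diff = max(abs(a - b) for a, b in zip(V_new, V))
            V = V_new
            if diff < tol:
                break
        index.append((1.0 - beta) * V[i])
    return index


# Toy two-state project: state 0 pays 0 and moves to state 1,
# which pays 1 forever.
indices = gittins_index_restart([[0.0, 1.0], [0.0, 1.0]], [0.0, 1.0], 0.9)
```

On this toy project the indices come out at approximately 0.9 for state 0 and 1.0 for state 1, which matches the hand calculation: from state 0 the best stopping rule is never to stop, giving a discounted reward rate equal to the discount factor.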

Similar resources

A Linear Programming Relaxation and a Heuristic for the Restless Bandit Problem with General Switching Costs

We extend a relaxation technique due to Bertsimas and Niño-Mora for the restless bandit problem to the case where arbitrary costs penalize switching between the bandits. We also construct a one-step lookahead policy using the solution of the relaxation. Computational experiments and a bound for approximate dynamic programming provide some empirical support for the heuristic.

A branch and bound algorithm to minimize the total weighted number of tardy jobs and delivery costs with late deliveries for a supply chain scheduling problem

In this paper, we study a supply chain scheduling problem that simultaneously considers production scheduling and product delivery.  jobs have to be scheduled on a single machine and delivered to  customers for further processing in batches. The objective is to minimize the sum of the total weighted number of tardy jobs and the delivery costs. In this paper, we present a heuristic algorithm (HA...

Near-Minimum-Time Motion Planning of Manipulators along Specified Path

The large amount of computation necessary for obtaining time optimal solution for moving a manipulator on specified path has made it impossible to introduce an on line time optimal control algorithm. Most of this computational burden is due to calculation of switching points. In this paper a learning algorithm is proposed for finding the switching points. The method, which can be used for both ...

Marginal productivity index policies for scheduling restless bandits with switching penalties

We address the dynamic scheduling problem for discrete-state restless bandits, where sequence-independent setup penalties (costs or delays) are incurred when starting work on a project. We reformulate such problems as restless bandit problems without setup penalties, and then deploy the theory of marginal productivity indices (MPIs) and partial conservation laws (PCLs) we have introduced and de...

Hybrid Meta-heuristic Algorithm for Task Assignment Problem

Task assignment problem (TAP) involves assigning a number of tasks to a number of processors in distributed computing systems and its objective is to minimize the sum of the total execution and communication costs, subject to all of the resource constraints. TAP is a combinatorial optimization problem and NP-complete. This paper proposes a hybrid meta-heuristic algorithm for solving TAP in a ...

Journal title:
  • INFORMS Journal on Computing

Volume 20  Issue 

Pages  -

Publication date 2008